Efficient MXNet sampling in the multinomial distribution #15311

zixuanweeei · 2019-06-21T10:34:17Z

Description

This PR addresses the low efficiency of sampling from the multinomial distribution, especially when the distribution has a large number of possible outcomes. As described in #15231, mx.nd.random.multinomial takes ~44 s, while np.random.multinomial takes ~0.019 s. After our optimization, the time drops to ~0.054 s.

Feature changes

New features

Efficient sampling strategy for the multinomial distribution using binary search.
Use the double type for cumulative variables to keep precision with very small probabilities.
Keep the CDF array so that repeated sampling does not redo the cumulative additions.
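The three points above can be sketched in pure Python. This is an illustrative sketch only, not the C++ operator code in this PR: the function names `build_cdf` and `sample_indices` are hypothetical, and the standard-library `bisect` module stands in for the binary search. Python floats are double precision, which mirrors the choice of double for the accumulated values.

```python
import bisect
import itertools
import random


def build_cdf(probs):
    """Accumulate the probabilities once so the CDF can be reused.

    Python floats are IEEE-754 doubles, so the running sum keeps
    double precision even when individual probabilities are tiny.
    """
    return list(itertools.accumulate(probs))


def sample_indices(cdf, num_samples, rng=random):
    """Draw outcome indices by binary-searching the CDF (hypothetical helper).

    Each draw costs O(log K) for K outcomes instead of a linear scan.
    """
    total = cdf[-1]
    last = len(cdf) - 1
    samples = []
    for _ in range(num_samples):
        # Scale the uniform draw in case the probabilities do not sum to exactly 1.
        u = rng.random() * total
        # First index whose cumulative value is strictly greater than u;
        # outcomes with zero probability are skipped automatically.
        idx = bisect.bisect_right(cdf, u)
        # Clamp to guard against floating-point rounding at the upper edge.
        samples.append(min(idx, last))
    return samples


if __name__ == "__main__":
    cdf = build_cdf([0.1, 0.2, 0.7])
    print(sample_indices(cdf, 5))
```

Building the CDF once and passing it to every call is what avoids the repeated additions when many samples are drawn from the same distribution.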

Performance

We compared the time cost of mx.nd.random.multinomial on the master branch (b8b352d) and on this PR; the results are shown in the table below. The time cost of the original sampling strategy rises much more rapidly with the number of outcomes than that of this PR.

Number of outcomes	master (`b8b352d`) cost (ms)	PR (`6c3c49b`) cost (ms)
10	0.125	0.066
100	0.270	0.058
1000	0.729	0.130
10000	62.717	0.996
100000	5372.306	12.125
1000000	-	173.534
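To make the trend in the table easier to read, the speedup per row can be computed directly from the reported timings. The snippet below is only arithmetic on the numbers above; the name `speedups` is a hypothetical helper, and the 1000000 row is left out because no master timing was reported for it.

```python
# Timings in ms copied from the table above: (outcomes, master, PR).
timings_ms = [
    (10, 0.125, 0.066),
    (100, 0.270, 0.058),
    (1000, 0.729, 0.130),
    (10000, 62.717, 0.996),
    (100000, 5372.306, 12.125),
]


def speedups(rows):
    """Return {number of outcomes: master time / PR time}."""
    return {n: master / pr for n, master, pr in rows}


if __name__ == "__main__":
    for n, ratio in speedups(timings_ms).items():
        print(f"{n:>7} outcomes: {ratio:.1f}x")
```

The ratio grows from roughly 2x at 10 outcomes to roughly 443x at 100000 outcomes.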

Comments

@pengzhao-intel @ciyongch @TaoLv Please help review and refine this PR. Thanks.

Check list

Passed code style checking (cpplint).
Unit tests passed.
Code is well-documented.

pengzhao-intel · 2019-06-21T11:40:11Z

@reminisce @wkcn for review.

TaoLv · 2019-06-21T15:22:45Z

Nice work! @zixuanweeei
@chinakook @stu1130 You might be interested.

wkcn

Great work. LGTM. Thank you!

stu1130

Brilliant! LGTM

chinakook · 2019-06-22T05:50:47Z

Thanks, LGTM!

wkcn · 2019-06-22T06:26:41Z

Merged. Thanks for your contribution!

zixuanweeei added 4 commits June 17, 2019 20:45

Effective multinomial

c08ca90

Meaningful uniform data pointer as input

a5484ac

Remove beginning Zeros from CDFs

5540eec

Double precision for accumulated var

6c3c49b

wkcn approved these changes Jun 21, 2019


stu1130 approved these changes Jun 21, 2019


wkcn added the pr-awaiting-merge label (Review and CI is complete. Ready to Merge) Jun 21, 2019

wkcn merged commit e6fad30 into apache:master Jun 22, 2019

roywei mentioned this pull request Jun 25, 2019

[CI] nightly failure test on amp tutorial #15355

Closed
